Evaluating Retail Recommender Systems via Retrospective Data: Lessons Learnt from a Live-Intervention Study
Authors
Abstract
Performance evaluation using retrospective data is essential to the development of recommender systems. However, the evaluation results must be representative of live, interactive behaviour. We present a case study in which several common evaluation strategies are applied to data from a live intervention. The intervention is designed as a case-control experiment on two cohorts of consumers (active and non-active) from an online retailer. This yields four binary hit-rate indicators of live performance. We compare them with the results of evaluation strategies applied to the same basket data that was available immediately before the recommendations were made, treating that data as historical. In this case, none of the standard evaluation strategies predicted binary hit rates comparable to those observed during the live intervention. We argue that these strategies may not represent live, interactive behaviour closely enough to usefully guide system development with retrospective data. We present a novel evaluation strategy that consistently yields binary hit rates comparable to the live results. It appears to mirror the actual operation of the recommender more closely, paying particular attention to the principles and constraints that are expected to apply.
Key Words: Recommender Systems, Performance Evaluation, Model Selection & Comparison, Business Applications, Lessons Learnt
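The abstract does not define the binary hit rate or show how the four live indicators arise. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure. It assumes that a consumer counts as a "hit" if at least one recommended item appears among their subsequent purchases. It also assumes that the four indicators come from crossing the two cohorts (active, non-active) with the two experimental arms (treatment, control), where control consumers are scored against the items that *would* have been recommended to them. The record layout and the function name `hit_rates_by_group` are hypothetical.

```python
from collections import defaultdict

# Hypothetical record layout (an assumption, not from the paper):
# (cohort, arm, recommended_items, purchased_items_after_recommendation)
def hit_rates_by_group(records):
    """Binary hit rate per (cohort, arm): share of consumers with >= 1 hit."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for cohort, arm, recommended, purchased in records:
        key = (cohort, arm)
        totals[key] += 1
        # Binary: a consumer is either a hit or a miss, however many items match.
        if set(recommended) & set(purchased):
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}

# Toy data, invented purely for illustration.
records = [
    ("active", "treatment", ["a", "b"], ["b", "c"]),  # hit
    ("active", "treatment", ["a"], ["c"]),            # miss
    ("active", "control", ["d"], ["d"]),              # hit
    ("non-active", "treatment", ["e"], []),           # miss
    ("non-active", "control", ["f"], ["g"]),          # miss
]
rates = hit_rates_by_group(records)
print(rates)
```

The same function could score a retrospective evaluation strategy, for example by hiding each consumer's most recent basket and treating it as `purchased_items`. The resulting per-group rates can then be set against the live indicators, which is the kind of comparison the abstract describes.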
Similar resources
Lessons learnt from errors in radiotherapy centers
Background: The purpose of this work is to discover and analyze errors and incidents in some radiotherapy centers, and to introduce methods that could reduce their occurrences, especially those which had happened due to the use of improper and inadequate equipment. This work is a first step toward clarifying the role of education in a risk-conscious culture, and changing the attitude of radioth...
REFEREE: An Open Framework for Practical Testing of Recommender Systems using ResearchIndex
Automated recommendation (e.g., personalized product recommendation on an ecommerce web site) is an increasingly valuable service associated with many databases—typically online retail catalogs and web logs. Currently, a major obstacle for evaluating recommendation algorithms is the lack of any standard, public, real-world testbed appropriate for the task. In an attempt to fill this gap, we hav...
High-level Synthesis: a Retrospective
High-level Synthesis or HLS represented an ambitious attempt by the community to provide capabilities for ‘algorithms to gates’ for a period of almost three decades. The technical challenge in realizing this goal drew researchers from various areas ranging from parallel programming, digital signal processing, and logic synthesis to expert systems. This article takes a journey through the years ...
A New WordNet Enriched Content-Collaborative Recommender System
Recommender systems are models that predict the potential interests of users among a number of items. These systems are widespread and have many real-world applications. They are generally based on one of two structural types: collaborative filtering and content filtering. Some systems are based on both of them. These systems are named hybrid recommen...
Evaluating Recommender Explanations: Problems Experienced and Lessons Learned for the Evaluation of Adaptive Systems
We describe the methodological considerations that arose over a series of experiments evaluating the effectiveness of explanations for recommendations. In particular, we look at issues relating to: criteria, metrics, product domain used, choice of materials, possible confounding factors, and approximation of experience versus real experience. We generalize the problems we found and the solution...